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f"*) Abstract 

The concept of data depth in non-parametric multivariate descriptive statistics is the gen- 
eralization of the univariate rank method to multivariate data. Halfspace depth is a measure 
of data depth. Given a set S of points and a point p, the halfspace depth (or rank) of p is 
defined as the minimum number of points of S contained in any closed halfspace with p on 
its boundary. Computing halfspace depth is NP-hard, and it is equivalent to the Maximum 
y—i Feasible Subsystem problem. In this paper a mixed integer program is formulated with the 

big-M method for the halfspace depth problem. We suggest a branch and cut algorithm for 

O these integer programs. In this algorithm, Chinneck's heuristic algorithm is used to find an 
upper bound and a related technique based on sensitivity analysis is used for branching. Irre- 
ducible Infeasible Subsystem (IIS) hitting set cuts are applied. We also suggest a binary search 
^2 algorithm which may be more numerically stable. The algorithms are implemented with the 

Q BCP framework from the COIN-OR project. 

1 Introduction 

m 

Halfspace depth is a measure of data depth which is a term from non-parametric multivariate 
descriptive statistics. The depth or rank of a point (a tuple data item) measures the centrality of 
this point with respect to a given set of points in a high dimensional space. The data with the 
largest depth is considered the median of the data set, which best describes the data set. Since 
r— I tuple data items can be represented as points in Euclidean space ]R d , these two terms are used 

interchangeably in this paper. 

Halfspace depth is also called Tukey depth or location depth. Given a set S of points and a 
k*" point p in M. d , the halfspace depth of p is defined as the minimum number of points of S contained 

in any closed halfspace with p on its boundary. The point with the largest depth is called halfspace 
median or Tukey median. In Figure [l] the halfspace depth of p is 3, because at least three points 
will be contained by any closed halfspace with p on its boundary. The halfspace depth of p can also 
be written as: 

min \{q £ Six ■ q < x ■ p}\ (1) 

where x is the outward normal vector of the closed halfspace. This optimization problem is equiv- 
alent to 

\S\ -max \{qe S\x ■ q > x ■ p}\ (2) 



*This research was partly supported by NSERC Canada. Computational Resources are supplied by ACEnet. 
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Figure 1 : An example of halfspace depth in M 2 



When a point is excluded from the halfspace, the corresponding inequality in ([2]) is satisfied. The 
problem is thus to find a vector x that maximizes the number of satisfied inequalities. 

A data set is said to be in general position if it has no ties, i.e. no more than two points on the 
same line, no more than three points on the same plane, and so forth. If the data set is in general 
position, the halfspace depth problem is identical to the open hemisphere problem introduced by 
Johnson and Preparata. Given a set of n points on the unit sphere S d in M. d , the open hemisphere 
problem is to find an open hemisphere of S d that contains as many points as possible. This problem 
is NP-complete if both n and d are parts of the input [TB] • 

1.1 Organization of This Paper 

This paper is organized as follows. In Section [2] we discuss on application of halfspace depth in 
non-parametric statistics. In Section [3] we show that the halfspace depth problem is equivalent to the 
maximum feasible subsystem (MAX FS) problem. In Section H we discuss different integer program 
formulations for the halfspace depth problem. In Section [5] we introduce Chinneck's heuristic 
algorithm for the MAX FS problem. In Section [6] we introduce our branch and cut algorithm. 
Unfortunately, in some circumstances this algorithm encounters numerical difficulties. For a more 
accurate result, we also introduce a binary search strategy in this section. In Section [7] we describe 
the details of our implementation using BCP; the BCP framework is also briefly introduced in this 
section. In Section[8]we give some testing results and benchmark the performance of our algorithm. 
In Section [9] we list some future work for this project. 

2 Application: Two Factor ANOVA 

In this section we describe an application of halfspace depth to ANOVA (ANalysis Of VAriance) 
testing. For more on the use of halfspace depth in this context, see [UlEO]. We consider here two 
factor ANOVA testing, where an experiment has two different experimental factors with levels in 
N = { 1 . . . n} and M = { 1 . . . m }. For example we may have n soil types and m fertilizers, and we 
wish to see how various combinations affect crop yield. For each experimental setting we have 
r data points ■ ■ ■ Zi,j> measuring outcomes. The subset { Zi j,k \ k — 1 . . . r } corresponding 
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to an experimental scenario is fit to (i.e. approximated by) a linear function fj,j + i/j. Our space 
of possible fits can be be identified with R n+m via the parameter vector i? = (hi . . . /x n , v\ . . . u m ). 
The quality of a given fit is evaluated using criteria! functions, typically using squared Euclidean 
distance like the following: 

fii,j,k{v) o ' 

It turns out that halfspace depth can be used to evaluate the local optimality of a given fit ■&. 
If there is another fit in the neighbourhood of i? that improves (i.e. decreases) every criterial 
function, then $ is clearly not optimal. To quantify this observation, we want to measure, over all 
possible directions of perturbation for the maximum number of criterial functions that decrease. 
In terms of the gradients (i.e. vectors of partial derivatives) VFij^fi?) of our criterial functions, 
the change in ,• ^ as we perturb in direction v is given by the inner product Vi^j • v. Since 
we want to decrease criterial functions, we are looking for a direction v such that the corresponding 
linear halfspace v T x > contains as few of the V -Fij.fc (i?) as possible. This is equivalent to asking 
for the halfspace depth of the origin in the set of gradients (for fixed fit the gradients will be 
constant vectors in K ra+m ). In our case the gradients have a simple form 

VFy^fl?) = ~(Zi,j,k - (M - Mj)( e i + e j) 

Since we only care about the sign, we may scale the gradients arbitrarily and consider instead the 
{ 0, ±1 } vectors 

Gi,j,k(&) = signVi^fc 

= -sign(z ij;fc - ^ - Hj){ei + e 3 ) 



We give computational results for some randomly generated examples of this type in Section 8.2 



3 Maximum Feasible Subsystem 

The halfspace depth problem has a strong connection with the Maximum Feasible Subsystem 
(MAX FS) problem. Given an infeasiblc linear system, the MAX FS problem is to find a maximum 
cardinality feasible subsystem. This problem is NP-hard [8, 23 , and also hard to approximate [3]. 

When p is contained in the convex hull of S, and p is on the boundary of a closed halfspace, 
as illustrated in Figure [l] there must be some data contained by the halfspace. Then the set of 
inequalities 

x ■ q > x ■ p \fq G S (3) 

or 

x ■ (q - p) > o y q e S (4) 

in ([2]) can not all be satisfied at the same time, i.e. Q is an infcasible linear system. Now it is 
clear that to compute the halfspace depth of p is to find the maximum number of inequalities in 
Q that can be satisfied at the same time; in other words to find the maximum feasible subsystem 
of Q. Therefore, the halfspace depth problem is a MAX FS problem. The MAX FS problem can 
also be seen as finding a minimum cardinality set of constraints, whose removal makes the original 
infcasible system feasible. This problem is called the minimum unsatisfied linear relation (MIN 
ULR) problem [TT]. 
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3.1 Irreducible Infeasible Subsystems 



In an infeasible linear system, an irreducible infeasible subsystem (IIS) is a subset of constraints 
that itself is infeasible, but any proper subsystem is feasible. If a subset of points A of S forms a 
minimal simplex which contains p, the inequalities in Q defined by A form an IIS. The point set A 
is a minimal dominating set (MDS), which is is a minimal set of points whose convex hull contains 



Every infeasible system contains one or more IISs. To make the original system feasible, we 
need to delete at least one inequality from every IIS, in other words, we need to delete a hitting set 
of all IISs in the infeasible system. The minimum- cardinality IIS set-covering (MIN IIS COVER) 
problem is to find the smallest cardinality set of constraints to hit all IISs of the original system (this 
problem is a minimum hitting set problem in the terminology of e.g. |13j . although it is called a set 
cover problem in [TTJ [2T]). The MIN IIS COVER set (hitting set) is the smallest set of constraints 
whose removal makes the original infeasible system feasible. Hence, the MIN IIS COVER problem 
is identical to the MIN ULR problem, and hence the MAX FS problem. 

Parker gives a method for the MAX FS problem in [5T], and Pfetsch further develops this 
method in |22j . Due to the fact that the infeasible system could contain an exponential number of 
IISs with respect to the number of constraints and the number of variables [S], the main idea of 
Parker's method is finding a subset of IISs in the whole problem and solving an integer program to 
find a minimum hitting set in each iteration. If the hitting set hits all IISs in the original infeasible 
system, the optimal solution is found. If not, find some IISs that are not hit by the current hitting 
set, then find (with an integer program) a new minimum hitting set that also hits the new IISs. 

An important part of this method is finding IISs. Given a linear system Ax > b, where A S K mXn 
and b € M. m , the following polyhedron: 



is defined as the alternative polyhedron. Each vertex of P corresponds to an IIS in the original 
infeasible system [T31 H3 [5TJ [55] . More precisely, the index set of non-zero supports of a vertex 
corresponds to an IIS. 

4 Mixed Integer Program (MIP) Formulation 

Parker suggests two integer program formulations for the MIN IIS COVER problem in [2Tj . 
One is applying the big-M method (see [5T] and [25]) to the inequalities in the infeasible system, 
and the other is based on the IIS inequalities. Suppose we have a group of data {A\, A2, . . . , A n } 
and a point A p in Euclidean space R d , and x is the normal vector of the halfspace that defines 
the halfspace depth of A p . Finding the halfspace depth of A p is equivalent to finding the MIN IIS 
COVER T of the following system: 



p®. 



P = {yeR m \y T A = 0,y T b=l,y>0} 



(5) 



,1 





. ,n} 



(G) 



i=i 



The depth of A p is |r|. To simplify Q , 



we call the vector (A,- 



A p ) aj. Then we rewrite (|6| as 



ajX > 



Vje {l,2,...,n} 



(7) 
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In HU, Parker treats the MIN IIS COVER problem with the IIS inequalities. MIN IIS COVER 
is a minimum hitting set problem, and the hitting set has at least one constraint in common with 
every IIS in the infeasible system. For an IIS C in we can use the binary variables associated 
with the constraints in C to formulate an inequality like 

E s '^ ( g ) 

tec 

where St is the binary variable associated with constraint t in ([7]). Using the IIS inequalities, a 
hitting set integer program is formulated in the following form: 



minimize y. s 

i=i 



£■ 

i=i 

subject to > 1 VC (IIS of system Q) (9) 

Si G {0,1} V»e{l,2,...,n} 

As we mentioned in Section [3T| Parker's strategy is to first find a small set of IISs and formulate an 
integer program (a sub-program of ([9])). After obtaining the optimal solution to the initial integer 
program, find some IISs that are not hit by the solution, add the corresponding IIS inequalities 
into the integer program and resolve it. The process stops when the solution hits all IISs in the 
infeasible system. This is easy to verify since it means exactly that the remaining system is feasible. 

To formulate an integer program with the big-M method, the strict inequalities in Q need 
to be transformed into non-strict ones. From ([7]), we can derive the following possibly infeasible 
system: 

a 3 x>e Vj € {1,2,..., n} (10) 

where e is a small positive real number. A mixed integer program can be formulated with the big-M 
method as follows: 

a 

minimize Sj 

j=i 

subject to ajX + sjM > e Vj G {1,2, ... ,n} (11) 

G {0,1} Vje{l,2,...,rc} 



s 



— oo < Xi < +oo Vi € {1, 2, . . . , d} 

Fixing the binary variable Sj to 1 has the effect of removing constraint j from ^ . The objective 
function is to minimize the number of constraints that have to be removed in order to find a 
feasible subsystem of For the general MIN IIS COVER problems, the big-M method may 
not be practical. As Parker and Pfetsch mentioned, the big-M should be big enough to make the 
infeasible system feasible, but if it is too big, this will cause numerical problems (see [3T] for details). 
In this paper we investigate the big-M method for the halfspace depth problem. In this formu- 



lation, it is easy to find a value for M to make (111 feasible, but the value of M should be large 



enough to guarantee an accurate result. For example, if M is assigned to e, (111 will be feasible 



for x — 0, but the optimal solution will not be the MIN IIS COVER of (11 1 because all the binary 



variables will be forced to 1 . Now we describe our methods to choose values for e and M in (111 
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4.1 Fixing the parameter M 



Since we only care about the sign of djX, instead of using infinity, we can choose an arbitrary 
bound — c < Xi < c for each element of vector x, where c is a constant. From the definition of inner 
product, we can get the following observation: 

i-«=||a:||.||g||.cosa<||a:||-|M| (12) 

Suppose point q max is the vector with the largest norm in the data set, then M can be set to the 
value of V d ■ c 2 ■ \\q m ax\\- Actually, we can assign a different value to M in each constraint, namely 
V d ■ c 2 • ||<?|| in the constraint associated with vector q. Even further, we can scale our vectors q, 
and make all data have the same norm. 



4.2 Fixing the parameter e 

Now let us consider the value of e. When computing the halfspace depth, maximizing the 
number of points contained in an open halfspace is the same as maximizing the number of points 



contained in a cone with solid angle less than n. For instance, if the open halfspace in Figure 2(a) 
is replaced by a cone with angle close enough to tt (see Figure |2(b)[ ) , we can still have the same 
depth value for p. 




Figure 2: Halfspace depth defined by an open halfspace and a cone 

Let the angle between the boundary of the cone and the halfspace be 9, x be the outward normal 
vector of the halfspace, q be a point contained by the cone, and a be the angle between x and q 
(interpreting q as a vector, and p as the origin) . The definition of the inner product tells us 

x-q = \\x\\ ■ \\q\\ • cosa = ||x|| • ||?|| • sin(| - a) > \\x\\ ■ \\q\\ ■ sin6> (13) 

where the last inequality is illustrated in Figure |2(b)| Suppose point q m in is the point with the 



smallest norm in the data set; then, no matter what value x is in the optimal solution of (11 1, the 
following inequality is satisfied for any point q in the data set. 

x- q> \\x\\ ■ \\q mi „\\ -sin9 (14) 



G 



Now the question is how to find the 9. Given a value for 9, e can be set to the value of 

yd - c 2 ■ \\q m in\\ -sm9 . 

The value of 9 can be bounded using the theory of integer lattices. In this argument all the data are 
considered as integral data (fractional data can be scaled up to integral data), i.e. the input data set 
S will be a subset of the integer lattice [T5] • The following theorem is due to Achill Schiirmann [2] . 

Theorem 4.1. Suppose points {Xi, X 2 , ■ ■ ■ , Xd} are affinely independent in Z d , | < m , (i,j — 
1,2,... ,d). Let H be an affine hull of {X\,X 2 , ■ ■ ■ , Xd}, and H does not contain the origin O. 
Then we can have the following bound for the distance from O to H , 

dist (H, O) > (2mVd)- (d - 1} . (15) 

Proof. Let 

I := Xi + 1(X 2 -Xx) + ... + Z{X d - X x ) 

be a lattice of Z d within H. Let Iq ■= H n Z d , then we have I C / . The distance h of H to a 
parallel plane containing lattice points is 

detZ d 1 



det Iq det Iq 

Then dist (H, O) > h > de ^ . Since detZ/det^o € N, we have det/ > det/o- Therefore, we 
have dist (H, O) > Let C m := {x € M d : | < m}. According to Hadamard's inequality, 

we have detZ < 11^=211^ — ^lll- Since — Ai|| < diameter(C m ) = 2my / d, dist (H, O) > 
(2mVd)-^ d - 1 l □ 



Let us now return back to the idea of halfspace depth defined by a cone (see (111). As shown in 
Figure [3j a distance h defines a cone, and p corresponds to the origin O in the former paragraph. 
The value of sin 9 will be at least /i/radius(C). 

When the dimension is high, such as 20, the value of e based on this lattice idea would be too 
small to be useful in practice. In our testing, we just set e to a very small value. If e is not small 
enough, it will have the same effect as M not being big enough (see page [5]) . 



5 A Heuristic Algorithm 

Chinneck [HUH] suggests a heuristic algorithm for the MIN IIS COVER problem. As discussed in 
Section [3] it is also an algorithm for the halfspace depth problem. This algorithm is based on several 
observations of elastic programming (originally a method for solving integer programs [7 according 
to Chinneck). In elastic programming, every constraint is elasticized by adding a non-negative 
elastic variable. Chinneck gives the following rules: the constraints in the form of J~^j aij%j ^ °ii 
Y^j a ij x j < h, or J^j a ij x j = h, become £\ a^Xj + e t >bi, J2j a ij x j ~ < k, or £\ a^Xj + e • - 
e" = bi respectively. The elastic objective function is to minimize the sum of the elastic variables, 
which is similar to phase 1 of the two phase simplex method |12j . After elasticizing, the original 
infeasible system becomes feasible, and the optimal solution will give some information about the 
infeasibility in the original system. This elastic programming is also similar to the big-Af method. 
In the big-M method, a set of binary variables with a large coefficient are used to make the infeasible 
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Figure 3: Lattice 



system feasible. When the optimal solution of the elastic program is found, the optimal value of the 
objective function is called the sum of the infeasibility (SINF). A nonzero elastic variable indicates 
a violated constraint in the original model, and the number of the nonzero variables is called the 
number of infeasibility (NINF). As Chinneck observed, the MIN IIS COVER problem is the problem 
to minimize NINF. At the optimal point, the value of an clastic variable is called the constraint 
violation of the corresponding constraint in the original model. The reduced cost of the slack or 
surplus variable is called the constraint sensitivity of the corresponding constraint, which, in fact, 
is the shadow price of that constraint. The shadow price of a constraint indicates how much the 
objective value of the optimal solution will be changed by changing the right hand side of the 
constraint by one unit. For more details about elastic programming, please refer to 1 OJ . 

The most important observation given by Chinneck is that SINF will be reduced more by 
eliminating a constraint in the MIN IIS COVER. For a violated constraint in the original model, 
the drop of the SINF can be estimated by (constraint violation) x | (constraint sensitivity) | . For an 
unviolated constraint, the drop can be estimated by | (constraint sensitivity) | . Detailed explanations 
of the observations are available in 9, 1 OJ . In the heuristic algorithm we basically estimate removing 
which constraint will reduce the SINF most, then remove that constraint from the infeasible system. 
We keep repeating this step until the system becomes feasible. The set of removed constraints is 
an IIS cover set. 

6 Techniques for This Algorithm 

In our branch and cut algorithm, we first use Chinneck's heuristic algorithm to find a feasible 
solution and set up the upper bound with this solution. Chinneck's heuristic algorithm is very fast 
and accurate. Most of the time this heuristic finds an optimal solution. Hence, we will have a very 
good upper bound at the beginning. The heuristic in [IT] is used, because it is faster according to 
Chinneck. We apply IIS hitting set inequalities as cutting planes for the problem. The cuts are 
problem-specific . 



Basic infeasible subsystem cuts 

The paper [5] describes how to generate a basic infeasible subsystem (BIS) of an infeasible 
system. Given an infeasible system Ax > b where A 6 K™ xd and b S R™, the basic infeasible 
subsystem is an infeasible subsystem of cardinality no more than d + 1 . To find a basic infeasible 
subsystem, the idea is to apply phase 1 of the two phase simplex method [T5] by solving the following 
LP: 

minimize xq 

subject to Ax + x q > b (16) 

After getting the optimal solution, the set of tight constraints corresponds to a basic infeasible 
subsystem of Ax > b. For more details about basic infeasible subsystems, please refer to [5]. A 
basic infeasible subsystem may not be irreducible if it contains a degenerate IIS whose cardinality 
is smaller than d+1. On the other hand, since every IIS is basic, it suffices to find a hitting set for 
the BISs. 



Pseudo-Knapsack Technique for Generating Cuts 

In order to generate cuts that are violated by the current solution of the LP relaxation, we use a 
pseudo-knapsack technique to find as many binary variables as possible with a summation smaller 
than 1 (note that the binary variables will become continuous variables in the LP relaxation, and 
with bounds < xi < 1 for any variable xA. After solving a LP relaxation, the binary variables are 
ranked according their values in increasing order. Select the first k variables (k is maximal) such 
that the summation of them is smaller than 1. Find the IISs in the corresponding constraints (in 



(10)) of these variables. Such an IIS must give a violated cutting plane for the current solution of 
the LP relaxation. 

In fact, identifying the maximum set of binary variables is not a true knapsack problem, because 
in this problem the cost and the value of an item (a binary variable) are the same. The greedy 
method in the above paragraph will give the optimal solution of this pseudo-knapsack problem. 
We can prove this by contradiction. Suppose {a\, 02, ... , a n } is the set of the values of the binary 
variables in increasing order, the greedy method identifies the first k items, and a better algorithm 
identifies a set J of j items (j > k) . The sum of any k + 1 items in J is greater or equal to X)j=i a i > 

because if J contains any items that are different from the items in {«i,«2 «a-+i}> any of those 

different items would be greater or equal to au+i- Hence, the sum of the items in J would be greater 
than 1, noting that 53i=i a* > 1- Therefore, a better algorithm cannot exist. 

This technique is used in one of the two hitting set cut generators we implemented. 



Branching Variable Selecting Rule 



When selecting the branching variable for a subproblem of (111, we mimic the technique in 
Chinneck's heuristic algorithm. Let S\ be the set of constraints of (10 1 that correspond to the 
constraints of the subproblem. After solving an elastic program of Si, we estimate the drop of 
SINF that each constraint can give (see Section [5]) . The constraint b which can give the most 



significant drop has the best chance to be a member of the MIN IIS COVER of (10 1 according to 
Observation 3 in [TT]. The binary variable Sb that corresponds to b is selected as the branching 
variable. 
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Candidate Problem Selection 

By fixing Sb, we get two new candidate problems, one with s& = 1, the other with sj, = 0. In 
the problem selection step, a depth first strategy is used and the problem with s b = 1 is selected 
as the new problem to process. As mentioned in Section |4j fixing the binary variable s& to 1 has 
the effect of removing constraint b from Q. As the algorithm continues to dive in the problem 
tree, ^ will usually become feasible quickly due to the accuracy of Chinneck's algorithm. At that 
point, the candidate problem will be fathomed because the optimal objective value will be 0, an 
integral solution. This strategy will hopefully keep the depth of the search tree small, so then we 
would have a good chance to have small search tree for the whole problem. 



6.1 A Binary Search Strategy 

The idea of fixing the e in Section [4] cannot be used in practice, and we can not guarantee the 
accuracy of the solutions by assigning the e an arbitrary small number. However, we can find an 



accurate solution for a problem by solving several MIPs. In this idea, the MIP (111 needs to be 
changed to the following form: 



minimize 



subject to sj < guess 

3=1 

</,r • SjM > e Vj € {l,2,...,n} (17) 

e > 

a s g {0,1} Vje{l,2,...,n} 

-c< x t <c Vie {1,2,..., d} 

In this formulation e is a variable, and there is also one more constraint in which guess is a value we 
want to test the depth against. If the optimal value of the objective function is 0, guess is smaller 
than the depth of point A p . Therefore, we can use binary search to try different values of guess. 

In the binary search algorithm our branch and cut algorithm is used as a subroutine which solves 
(partially) a MIP per invocation. The subroutine will finish as soon as it finds a feasible solution 
which gives a nonzero e, because a nonzero e implies that guess is no less than the depth of A p . 
The binary search algorithm maintains a cut pool containing the cutting planes generated in the 
early subroutines. The cuts will be used as indexed cuts for later subroutines. 



7 Implementation 

Our algorithm is implemented with the BCP library from the COIN-OR project [T], along with 
the Osi, Clp and Cgl libraries from this project. For the binary search algorithm, we just make 
some adjustments to the branch and cut algorithm, and use it as a subroutine. BCP is a set of C++ 
classes and functions which manage the search tree. It does not contain any LP solver or cutting 
plane generator. The Osi (Open Solver Interface) library is used as the interface between BCP and 
an LP solver. Clp (COIN-OR linear programming) is used as the LP solver in our implementation. 
Some commercial LP solvers, like CPLEX or Xpress, might be faster than Clp, but we want other 
researchers to have easy access to our codes. Of course, it is possible to change the code in order to 
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use other LP solvers thanks to Osi. Cgl (Cut Generation Library) is a collection of cut generators, 
which is used to generate cutting planes for BCP. BCP is designed for parallel execution in the 
master slave paradigm; it also supports sequential execution. 

Our implementation is based on the example BAC [T5] written by Margot. We also implemented 
Chinneck's heuristic algorithm [TT] and two cut generators. The cut generators will receive the 
solutions of an LP relaxation from the LP process and generate cutting planes based on these 
solutions. 

Bremner, Fukuda, and Rosta developed a primal-dual algorithm for the halfspace depth problem 
in [6 . Their algorithm is to find the minimum traversal of all MDSs in the input data set. They 
developed a library to generate the MDSs (recall that MDSs are [the same as] IISs), and that library 
is based on Avis' Lrslib library 0]. We use this MDS generating library to generate IISs for our 
algorithm in the first cut generator. In this cut generator, we use the pseudo-knapsack technique 
in Section [6] to find a set of binary variables. Then we identify the set of points in the input data 
set that correspond to the binary variables. This set of points are then used as the input for the 
MDS generate library to generate a set of IISs. Finally we formulate a set of cutting planes in the 
form of one for each IIS found. 

BCP does not support global cuts currently. Any cuts added to a subproblem are only available 
to its children. This is unfortunate for us, since all hitting set cuts are globally valid. On the other 
hand, keeping too many cuts can slow down each node (Bremner, Fukuda, and Rosta [6] observed 
adding all IIS cuts made solving the LP relaxation very slow). To use the candidate problem 
selection strategy in Section [6j we set the tree search strategy to the depth first search and the 
child preference to dive down. 

8 Computational Experiments 

Our algorithms have been tested on a Myrinct / 4-way cluster that consists of dual socket SunFirc 
x4100 nodes which are populated with 2.6 GHz dual-core Opteron 285 SE processors and 4 GB 
RAM per core. We set the CPU time limit to 60 minutes in these tests. For readability, we can 
not list the raw experimental results, and report only a summary in this section. 

In practice, if the value of the e in the MIP is too small compared with the coefficients of the 
constraints, the linear programming solver would round it to zero. Our remedy is scaling the data 
items, and making the norms similar and relatively small. For a few data sets, the depth values 
reported by our algorithm with different strategies or parameters are different (with a difference of 
1). This could be caused by bugs in our codes or bugs in BCP, but we also suspect this is due to 
some numerical issue. 

8.1 Results for Random Generated Data 

The data sets tested in this section are a subset of the data sets used in [BJ, and they are 
randomly generated. For every data set we compute the depth of the first point, which is the 
origin. For all the tests in this section, the e of the MIP ( [TT] ) is set to 0.00001. Comparing with the 
results of the primal-dual algorithm and the binary search algorithm, the depth values computed 
with our branch and cut algorithm (with BIS cut generator) are accurate. Therefore the e is small 
enough. 
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8.1.1 Comparing Branching Rules 

We first test our algorithm with the first hitting set cut generator in Section [7J the one imple- 
mented with the MDS generating library, and with the greedy branching rules (see Section [6]). We 
generate 10 cuts in one iteration of the LP process. If the objective value is improved by 0.001, the 
LP process will do another iteration. When the MDS cut generator is used, most of the CPU time 
is spent on cutting plane generation. If more cuts are generated in one iteration, the algorithm will 
be slowed down, but will be more memory efficient. In Figure [4] and Figure [5] we compare the per- 
formance of BCP's default strong with the greedy branching rules. The figures show (particularly 
in Figure |5| that strong branching gives a little better performance for most problems, probably 
because less search tree nodes are processed. For many difficult problems, the greedy branching 
works better. In those difficult cases, greedy branching spent much less time on branching, although 
more search tree nodes would be processed. 




dataset (dimension / depth) 



Figure 4: Comparison of different branching rules 



8.1.2 Comparing Cut Generators 

With the BIS cut generator, much less CPU time will be used to generate cuts, and the algorithm 
has better overall performance, although the search tree is larger. The BIS cut generator uses 
floating point arithmetic, the same as the rest of the system. The MDS cut generator uses exact 
arithmetic which is required for Lrslib. This is a factor which slows down the MDS cut generator. 
In fact many cuts are generated repeatedly in the optimization process when the BIS cut generator 
is used. The pseudo-knapsack idea in Section [6] can force the algorithm to generate a different cut 
each time, but the performance turns out to be worse, and more search tree nodes will be processed. 
With the pseudo-knapsack idea, if the algorithm generates a cut with a probability less than 1 on 
each node, the performance will be improved to some extent, although still worse than that without 
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dataset (# of points / depth) 



Figure 5: Comparison of different branching rules 



pseudo-knapsack. We also observe that the pseudo-knapsack idea can make the algorithm faster 
when the greedy branching is applied. This suggests that the pseudo-knapsack idea interferes with 
the strong branching rule. The reason might be that this idea makes the values of the binary 
variables in the solution of LP relaxation closer to each other. 

In Figure [6] and Figure [7] we compare the performance of our algorithm with the two different 
cutting plane generators. The general cut generators in Cgl can barely generate cuts for our 
algorithm, and do not improve the performance. 

8.1.3 Comparing Algorithms 

The binary search algorithm does not perform too badly, but the primal-dual algorithm is very 
slow on some hard problems. In Figure [8] and Figure [9] we compare the performance of the binary 
search algorithm, the primal-dual algorithm, and the branch and cut algorithm. The branch and cut 
algorithm works best most of the time. The performance of the binary search algorithm is actually 
quite fast (as well as being more numerically stable). Sometimes the binary search algorithm even 
works faster than the branch and cut algorithm. The reason is that the MIPs for the binary search 
algorithm are usually easier to solve, and the tricks used in the binary search algorithm also help 
to speed up the algorithm. In contrast, the primal-dual algorithm can be slow on large problems. 

8.1.4 Parallel Execution 

All the above tests are done with the sequential version of our algorithm. Some tests of parallel 
version of the branch and cut algorithm are given in Figure |10| Two data sets are used to test the 
algorithm. The performance with one processor is the performance of the sequential version of the 
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dataset (dimension / depth) 



Figure 6: Comparison of different cutting plane generators 




Figure 7: Comparison of different cutting plane generators 



algorithm. When two processors are applied, one of them is used for the slave process (LP process), 
and when four processors are applied, three of them are used for the slave process. So we expect a 
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Figure 8: Comparison of different algorithms 




dataset (# of points / depth) 



Figure 9: Comparison of different algorithms 



speedup of 3 for four processors, 7 for eight processors, and so forth. The dashed line in the figure 
indicates the linear speedup with respect to number of LP processes. From the figure we can see 
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that the speedup is almost linear. 



points in dimension 5, depth 24 
O 80 points in dimension 10, depth 10 



Performance of the 
parallel execution 




number of processors 



Figure 10: The Performance of Parallel Execution 



8.2 Results for ANOVA Data 

The ANOVA data tested in this section are randomly generated according to the scheme dis- 
cussed in Section [2] There are some duplicated points in every data set. The same duplicated 
data are cither all inside the halfspace or all outside the halfspace when finding the depth of a 
point. Therefore, the binary variables associated with these data will be one or zero simultane- 
ously. When formulating the MIP, we keep only one of the duplicated constraints and assign a 
weight of the number of the duplicated constraints to the associated binary variable in the objec- 
tive function. For every data set, the depth of the origin is computed. Table [I] gives a comparison 
of the performance of our algorithm with different integer program formulations, the simple MIP 
as (11 1 and the weighted MIP as described above. The sequential algorithm is used for these tests. 



The upper bound of the number of duplications (per data point) in the data set is given in the 
third column. From the table we can see that the algorithm using the weighted MIP is much faster, 
because there are many fewer rows in the MIP, and correspondingly fewer binary variables. 



9 Conclusions and Directions for Future Work 



Comparing the branch and cut algorithm with the binary search algorithm and the primal- 
dual algorithm, we conclude that the branch and cut algorithm is the fastest, although with some 
numerical issues. The binary search is slower, but still faster than the primal-dual algorithm and 
more stable. Fast cutting plane generators are important, because the BIS cut generator improves 
the performance dramatically. The strong branching rule is a little faster than the greedy branching 
rule on most of the tests, but the greedy branching rule is faster on many hard problems (i.e. those 
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Point # 


Dimension 


Duplication 


Depth 


Simple MIP 


Weighted MIP 


32 


8 


2 


5 


0.62 


0.19 


32 


8 


2 


10 


5.89 


2.06 


32 


8 


2 


7 


0.90 


0.24 


32 


8 


2 


7 


0.54 


0.31 


32 


8 


2 


4 


0.14 


0.03 


48 


8 


3 


5 


0.11 


0.08 


48 


8 


3 


11 


4.39 


0.82 


48 


8 


3 


10 


2.17 


0.62 


48 


8 


3 


9 


2.04 


0.28 


48 


8 


3 


13 


27.72 


2.00 


64 


8 


4 


11 


1.92 


0.22 


64 


8 


4 


17 


91.89 


2.98 


64 


8 


4 


18 


200.82 


3.48 


64 


8 


4 


15 


30.85 


0.87 


64 


8 


4 


16 


28.94 


1.60 


72 


12 


2 


13 


147.59 


22.19 


72 


12 


2 


18 


807.20 


250.77 


72 


12 


2 


14 


85.94 


33.52 


72 


12 


2 


17 


529.16 


69.92 


72 


12 


2 


20 


outmem 


469.81 


108 


12 


3 


26 


outmem 


519.49 


108 


12 


3 


24 


outmem 


264.20 


108 


12 


3 


24 


outmem 


341.87 


108 


12 


3 


29 


outmem 


1435.35 


108 


12 


3 


22 


outmem 


105.49 


144 


12 


4 


33 


outmem 


1238.99 


144 


12 


4 


39 


outmem 


1760.49 


144 


12 


4 


40 


outmem 


1527.83 


144 


12 


4 


33 


outmem 


544.95 


144 


12 


4 


29 


outtime 


330.57 



Table 1: Performance with different integer program formulations 



with large depth). The branch and cut algorithm has almost linear speed up for parallel execution. 
On ANOVA data sets, the duplicated constraints are removed with the weighted MIP formulation. 
With this modification, the algorithm solved all the problems we tested. 

In some applications, only the median of the data set is interesting. With the current algorithm 
we have to compute the depth of every data item in order to find the median. Finding a fast 
algorithm for computing the median is open for future work. 

The idea for finding a proper e described in Section [4] is not practical. Another open problem is 
a method to find a practical e for MIP (11). It may be possible to solve an MIP based on the strict 
inequalities of system ([7]). Then we do not need to consider e. The binary search algorithm does 
not require a value for e, and it can report a proper value for e. Ironically, this algorithm finds a 
proper value after solving the halfspace depth problem. 



As we noticed in Section 8.1.2 the pseudo-knapsack idea slows down the strong branching when 
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using the BIS cut generator. An idea for reducing redundant cut generation in the BIS cut generator 
that does not interfere with the strong branching would be interesting. 
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